Abstract

Computing the coordinate-wise maxima of a planar point set is a classic and well-studied problem in
computational geometry. We give an algorithm for this problem in the self-improving setting. We have n (unknown)
independent distributions $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n$ of planar points. An input point set $(p_1, p_2, \ldots, p_n)$ is generated
by taking an independent sample $p_i$ from each $\mathcal{D}_i$, so the input distribution $\mathcal{D}$ is the product $\prod_i \mathcal{D}_i$. A
self-improving algorithm repeatedly gets input sets from the distribution $\mathcal{D}$ (which is a priori unknown) and
tries to optimize its running time for $\mathcal{D}$. Our algorithm uses the first few inputs to learn salient features
of the distribution, and then becomes an optimal algorithm for distribution $\mathcal{D}$. Let $\mathrm{OPT}_\mathcal{D}$ denote the
expected depth of an optimal linear comparison tree computing the maxima for distribution $\mathcal{D}$. Our algorithm
eventually has an expected running time of $O(\mathrm{OPT}_\mathcal{D} + n)$, even though it did not know $\mathcal{D}$ to begin with.

Our result requires new tools to understand linear comparison trees for computing maxima. We show 
how to convert general linear comparison trees to very restricted versions, which can then be related to the 
running time of our algorithm. An interesting feature of our algorithm is an interleaved search, where the 
algorithm tries to determine the likeliest point to be maximal with minimal computation. This allows the 
running time to be truly optimal for the distribution $\mathcal{D}$.

1 Introduction

Given a set P of n points in the plane, the maxima problem is to find those points $p \in P$ for which no other
point in P has a larger x-coordinate and a larger y-coordinate. More formally, for $p \in \mathbb{R}^2$, let $x(p)$ and $y(p)$
denote the x and y coordinates of p. Then $p'$ dominates $p$ if and only if $x(p') \ge x(p)$, $y(p') \ge y(p)$, and one
of these inequalities is strict. The desired points are those in P that are not dominated by any other points
in P. The set of maxima is also known as a skyline in the database literature [BKS01] and as a Pareto
frontier.
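The definition of dominance can be made concrete with a short sketch. The function names `dominates` and `maxima_bruteforce` are illustrative choices, not notation from the text, and the quadratic-time scan is meant only to pin down the definition, not to stand in for any of the algorithms discussed here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(q: Point, p: Point) -> bool:
    """True if q dominates p: q is at least as large as p in both
    coordinates and strictly larger in at least one of them."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def maxima_bruteforce(points: List[Point]) -> List[int]:
    """Indices of the points that no other point dominates (O(n^2) time)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
```

For the five points (1, 5), (2, 3), (3, 4), (4, 1), (0, 0), the maxima are the points with indices 0, 2, and 3.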

This algorithmic problem has been studied since at least 1975 [KLP75], when Kung et al. described an
algorithm with an $O(n \log n)$ worst-case time and gave an $\Omega(n \log n)$ lower bound. Results since then
include average-case running times of $n + O(n^{6/7})$ point-wise comparisons [Gol94]; output-sensitive algorithms
needing $O(n \log h)$ time when there are h maxima [KS86]; and algorithms operating in external-memory
models [GTVV93]. A major problem with worst-case analysis is that it may not reflect the behavior of
real-world inputs. Worst-case algorithms are tailor-made for extreme inputs, none of which may occur (with
reasonable frequency) in practice. Average-case analysis tries to address this problem by assuming some
fixed distribution on inputs; for maxima, the property of coordinate-wise independence covers a broad range
of inputs, and allows a clean analysis [Buc89], but is unrealistic even so. The right distribution to analyze
remains a point of investigation. Nonetheless, the assumption of randomly distributed inputs is a very natural
one and worthy of further research.

The self-improving model. Ailon et al. introduced the self-improving model to address this issue
[ACCL06]. In this model, there is some fixed but unknown input distribution $\mathcal{D}$ that generates independent
inputs, that is, whole input sets P. The algorithm initially undergoes a learning phase, where it processes
inputs with a worst-case guarantee but tries to learn information about $\mathcal{D}$. The aim of the algorithm is to
become optimal for the distribution $\mathcal{D}$. After seeing some (hopefully small) number of inputs, the algorithm
shifts into the limiting phase. Now, the algorithm is tuned for $\mathcal{D}$ and the expected running time is (ideally)
optimal for $\mathcal{D}$. A self-improving algorithm can be thought of as an algorithm that attains the optimal
average-case running time for all, or at least a large class of, distributions $\mathcal{D}$.

Following earlier self-improving algorithms, we assume the input has a product distribution. An input
is a set of n points $P = (p_1, p_2, \ldots, p_n)$ in the plane. Each $p_i$ is generated independently from a distribution
$\mathcal{D}_i$, so the probability distribution of P is the product $\prod_i \mathcal{D}_i$. The $\mathcal{D}_i$'s themselves are arbitrary, and the
only assumption made is their independence. There are lower bounds showing that some restriction on $\mathcal{D}$ is
necessary for a reasonable self-improving algorithm, as we explain later.

The first self-improving algorithm was for sorting; this was extended to Delaunay triangulations, with
these results eventually merged [CS08, ACC+11]. A self-improving algorithm for planar convex hulls was
given by Clarkson et al. [CMS10]; however, their analysis was recently discovered to be flawed.

1.1 Main result 

Our main result is a self-improving algorithm for planar coordinate-wise maxima over product distributions.
We need some basic definitions before stating our main theorem. We explain what it means for a maxima
algorithm to be optimal for a distribution $\mathcal{D}$. This in turn requires a notion of certificates for maxima,
which allow the correctness of the output to be verified in $O(n)$ time. Any procedure for computing maxima
must provide some "reason" to deem an input point p non-maximal. The simplest certificate would be
to provide an input point dominating p. Most current algorithms implicitly give exactly such certificates
[KLP75, Gol94, KS86].

Definition 1.1. A certificate $\gamma$ has: (i) the sequence of the indices of the maximal points, sorted from left to
right; (ii) for each non-maximal point, a per-point certificate of non-maximality, which is simply the index
of an input point that dominates it. We say that a certificate $\gamma$ is valid for an input P if $\gamma$ satisfies these
conditions for P.
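To illustrate why such a certificate can be checked in linear time, here is a minimal sketch of a verifier. The representation (a list `maxima` of indices and a dictionary `witness` mapping each non-maximal index to a dominating index) and the function name `verify_certificate` are assumptions made for this example; the sketch also assumes that no two points share an x- or y-coordinate, which holds with probability 1 for continuous distributions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def verify_certificate(points: List[Point],
                       maxima: List[int],
                       witness: Dict[int, int]) -> bool:
    """Check a certificate in O(n) time, assuming distinct coordinates.

    maxima  -- claimed maximal indices, sorted from left to right
    witness -- for each other index, the index of a point claimed to dominate it
    """
    n = len(points)
    is_max = [False] * n
    for i in maxima:
        if not 0 <= i < n or is_max[i]:
            return False
        is_max[i] = True
    # Left to right, maxima must form a staircase: x increases, y decreases.
    # Then no claimed maximal point dominates another one.
    for a, b in zip(maxima, maxima[1:]):
        if not (points[a][0] < points[b][0] and points[a][1] > points[b][1]):
            return False
    # Every remaining point needs a witness that really dominates it.
    for i in range(n):
        if is_max[i]:
            continue
        j = witness.get(i)
        if j is None or not 0 <= j < n:
            return False
        if not (points[j][0] > points[i][0] and points[j][1] > points[i][1]):
            return False
    return True
```

The two checks together suffice: following witnesses from any point must end at a claimed maximal point, so a claimed maximal point that was in fact dominated would be dominated by another claimed maximal point, which the staircase test rules out.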

The model of computation that we use to define optimality is a linear computation tree that generates 
query lines using the input points. In particular, our model includes the usual CCW-test that forms the 
basis for many geometric algorithms. 

Let $\ell$ be a directed line. We use $\ell^+$ to denote the open halfplane to the left of $\ell$ and $\ell^-$ to denote the
open halfplane to the right of $\ell$.

Definition 1.2. A linear comparison tree T is a binary tree such that each node v of T is labeled with a
query of the form "$p \in \ell_v^+$?". Here p denotes an input point and $\ell_v$ denotes a directed line. The line $\ell_v$ can
be obtained in four ways: (i) it can be a line independent of the input (but dependent on the node v); (ii) it
can be a line with a slope independent of the input (but dependent on v) passing through a given input point;
(iii) it can be a line through an input point and through a point q independent of the input (but dependent
on v); (iv) it can be the line defined by two distinct input points. A linear comparison tree is restricted if it
only makes queries of type (i).

A linear comparison tree T computes the maxima for P if each leaf corresponds to a certificate. This
means that each leaf v of T is labeled with a certificate $\gamma$ that is valid for every possible input P that reaches
v.

Let T be a linear comparison tree and v be a node of T. Note that v corresponds to a region $\mathcal{R}_v \subseteq \mathbb{R}^{2n}$
such that an evaluation of T on input P reaches v if and only if $P \in \mathcal{R}_v$. If T is restricted, then $\mathcal{R}_v$ is the
Cartesian product of a sequence $R_1, R_2, \ldots, R_n$ of polygonal regions. The depth of v, denoted by $d_v$, is the
length of the path from the root of T to v. Given T, there exists exactly one leaf $v(P)$ that is reached by the
evaluation of T on input P. The expected depth of T over $\mathcal{D}$, $d_\mathcal{D}(T)$, is defined as $\mathrm{E}_{P \sim \mathcal{D}}[d_{v(P)}]$. Consider
some comparison-based algorithm A that is modeled by such a tree T. The expected depth of T is a lower
bound on the number of comparisons performed by A.

Fig. 1: Examples of difficult distributions

Let $\mathbf{T}$ be the set of trees that compute the maxima of n points. We define $\mathrm{OPT}_\mathcal{D} = \inf_{T \in \mathbf{T}} d_\mathcal{D}(T)$. This
is a lower bound on the expected time taken by any linear comparison tree to compute the maxima of inputs
distributed according to $\mathcal{D}$. We would like our algorithm to have a running time comparable to $\mathrm{OPT}_\mathcal{D}$.

Theorem 1.3. Let $\varepsilon > 0$ be a fixed constant and $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n$ be independent planar point distributions.
The input distribution is $\mathcal{D} = \prod_i \mathcal{D}_i$. There is a self-improving algorithm to compute the coordinate-wise
maxima whose expected time in the limiting phase is $O(\varepsilon^{-1}(n + \mathrm{OPT}_\mathcal{D}))$. The learning phase lasts for $O(n^\varepsilon)$
inputs and the space requirement is $O(n^{1+\varepsilon})$.

There are lower bounds in [ACC+11] (for sorters) implying that a self-improving maxima algorithm that
works for all distributions requires exponential storage, and that the time-space tradeoff (with respect to $\varepsilon$) in the above
theorem is optimal.

Challenges. One might think that since self-improving sorters are known, an algorithm for maxima
should follow directly. But this reduction is only valid for $O(n \log n)$ algorithms. Consider Figure 1(i). The
distributions $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{n/2}$ generate the fixed points shown. The remaining distributions generate a
random point from a line below L. Observe that an algorithm that wishes to sort the x-coordinates requires
$\Omega(n \log n)$ time. On the other hand, there is a simple comparison tree that determines the maxima in $O(n)$
time. For all $p_j$ where $j > n/2$, the tree simply checks if $p_{n/2}$ dominates $p_j$. After that, it performs a linear
scan and outputs a certificate.

We stress that even though the points are independent, the collection of maxima exhibits strong
dependencies. In Figure 1(ii), suppose a distribution $\mathcal{D}_i$ generates either $p_h$ or $p_\ell$; if $p_\ell$ is chosen, we must consider
the dominance relations among the remaining points, while if $p_h$ is chosen, no such evaluation is required.
The optimal search tree for a distribution $\mathcal{D}$ must exploit this complex dependency.

Indeed, arguing about optimality is one of the key contributions of this work. Previous self-improving 
algorithms employed information-theoretic optimality arguments. These are extremely difficult to analyze 
for settings like maxima, where some points are more important to process than others, as in Figure 1. (The
main error in the self-improving convex hull paper [CMS10] was an incorrect consideration of dependencies.) 
We focus on a somewhat weaker notion of optimality — linear comparison trees — that nonetheless covers most 
(if not all) important algorithms for maxima. 

In Section 3, we describe how to convert linear comparison trees into restricted forms that use much more 
structured (and simpler) queries. Restricted trees are much more amenable to analysis. In some sense, a
restricted tree decouples the individual input points and makes the maxima computation amenable to separate
$\mathcal{D}_i$-optimal searches. A leaf of a restricted tree is associated with a sequence of polygons $(R_1, R_2, \ldots, R_n)$
such that the leaf is visited if and only if every $p_i \in R_i$, and conditioned on that event, the $p_i$ remain
independent. This independence is extremely important for the analysis. We design an algorithm whose behavior
can be related to the restricted tree. Intuitively, if the algorithm spends many comparisons involving a single
point, then we can argue that the optimal restricted tree must also do the same. We give more details about
the algorithm in Section 2. 

1.2 Previous work 

Afshani et al. [ABC09] introduced a model of instance-optimality applying to algorithmic problems including
planar convex hulls and maxima. (However, their model is different from, and in a sense weaker than, the
prior notion of instance-optimality introduced by Fagin et al. [FLN01].) All previous (e.g., output-sensitive
and instance-optimal) algorithms require expected $\Omega(n \log n)$ time for the distribution given in Figure 1,
though an optimal self-improving algorithm only requires $O(n)$ expected time. (This was also discussed
in [CMS10] with a similar example.)

We also mention the paradigm of preprocessing regions in order to compute certain geometric structures 
faster (see, e.g., [BLMM11, EM11, HM08, LS10, vKLM10]). Here, we are given a set $\mathcal{R}$ of planar regions, and
we would like to preprocess $\mathcal{R}$ in order to quickly find the (Delaunay) triangulation (or convex hull) for any
point set which contains exactly one point from each region in $\mathcal{R}$. This setting is adversarial, but if we only
consider point sets where a point is randomly drawn from each region, it can be regarded as a special case
of our setting. In this view, these results give us bounds on the running time a self-improving algorithm can
achieve if $\mathcal{D}$ draws its points from disjoint planar regions.

1.3 Preliminaries and notation 

Before we begin, let us define some basic concepts and agree on a few notational conventions. We use c for 
a sufficiently large constant, and we write $\log x$ to denote the logarithm of x in base 2. All the probability
distributions are assumed to be continuous. (It is not necessary to do this, but it makes many calculations 
a lot simpler.) 

Given a polygonal region $R \subseteq \mathbb{R}^2$ and a probability distribution $\mathcal{D}$ on the plane, we call $\ell$ a halving line
for R (with respect to $\mathcal{D}$) if

$$\Pr_{p \sim \mathcal{D}}[p \in \ell^+ \cap R] = \Pr_{p \sim \mathcal{D}}[p \in \ell^- \cap R].$$

Note that if $\Pr_{p \sim \mathcal{D}}[p \in R] = 0$, every line is a halving line for R. If not, a halving line exactly halves the
conditional probability for p being in each of the corresponding halfplanes, conditioned on p lying inside R.

Define a vertical slab structure S as a sequence of vertical lines partitioning the plane into vertical regions, 
called leaf slabs. (We will consider the latter to be the open regions between the vertical lines. Since we 
assume that our distributions are continuous, we abuse notation and consider the leaf slabs to partition the 
plane.) More generally, a slab is the region between any two vertical lines of S. The size of the slab
structure is the number of leaf slabs it contains. We denote it by |S|. Furthermore, for any slab S, the
probability that $p_i \sim \mathcal{D}_i$ is in S is denoted by $q(i, S)$.

A search tree T over S is a comparison tree that locates a point within leaf slabs of S. Each internal 
node compares the x-coordinate of the point with a vertical line of S, and moves left or right accordingly. 
We associate each internal node v with a slab $S_v$ (any point in $S_v$ will encounter v along its search).
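The notion of a search tree over a slab structure can be sketched as follows. This example uses a balanced binary tree over the vertical lines, which is an assumption made for simplicity; the trees constructed in the learning phase are tuned to the distributions and would generally have a different shape. The function name `search_path` and the encoding of a slab as a pair of leaf-slab indices are likewise illustrative.

```python
from typing import List, Tuple


def search_path(boundaries: List[float], x: float) -> List[Tuple[int, int]]:
    """Locate x among the leaf slabs cut out by the sorted vertical lines
    in `boundaries`, recording every slab visited along the way.

    Leaf slab i lies between boundaries[i - 1] and boundaries[i]; the pair
    (lo, hi) stands for the slab made of leaf slabs lo, lo + 1, ..., hi.
    """
    lo, hi = 0, len(boundaries)
    path = [(lo, hi)]          # the root's slab is the whole plane
    while lo < hi:
        mid = (lo + hi) // 2   # compare against one vertical line per node
        if x < boundaries[mid]:
            hi = mid           # move to the left child
        else:
            lo = mid + 1       # move to the right child
        path.append((lo, hi))
    return path
```

With vertical lines at x = 1, 2, 3, a search for x = 2.5 visits the slabs (0, 3), (2, 3), and finally the leaf slab (2, 2) between the lines at 2 and 3.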

1.4 Tools from self-improving algorithms 

We introduce some tools that were developed in previous self-improving results. The ideas are by and large 
old, but our presentation in this form is new. We feel that the following statements (especially Lemma 1.6) 
are of independent interest. 

We define the notion of restricted searches, introduced in [CMS10]. This notion is central to our final 
optimality proof. (The lemma and formulation as given here are new.) Let U be an ordered set and $\mathcal{F}$ be a
distribution over U. For any element $j \in U$, $q_j$ is the probability of j according to $\mathcal{F}$. For any interval S of
U, the total probability of S is $q_S$.

We let T denote a search tree over U. It will be convenient to think of T as (at most) ternary, where 
each node has at most 2 children that are internal nodes. In our application of the lemma, U will just be
the set of leaf slabs of a slab structure S. We now introduce some definitions regarding restricted searches
and search trees. 

Definition 1.4. Consider a distribution $\mathcal{F}$ and an interval S of U. An S-restricted distribution is given by
the probabilities (for element $r \in U$) $q'_r / \sum_{j \in U} q'_j$, where the sequence $\{q'_j \mid j \in U\}$ has the following property.
For each $j \in S$, $0 \le q'_j \le q_j$. For every other j, $q'_j = 0$.

Suppose $j \in S$. An S-restricted search is a search for j in T that terminates once j is located in any
interval contained in S.

For any sequence of numbers $\{q'_j \mid j \in U\}$ and $S \subseteq U$, we use $q'_S$ to denote $\sum_{j \in S} q'_j$.

Definition 1.5. Let $\mu \in (0, 1)$ be a parameter. A search tree T over U is $\mu$-reducing if: for any internal
node S and for any non-leaf child S' of S, $q_{S'} \le \mu q_S$.

A search tree T is c-optimal for restricted searches over $\mathcal{F}$ if: for all S and S-restricted distributions
$\mathcal{F}_S$, the expected time of an S-restricted search over $\mathcal{F}_S$ is at most $c(-\log q'_S + 1)$. (The probabilities q' are
as given in Definition 1.4.)

We give the main lemma about restricted searches. A tree that is optimal for searches over $\mathcal{F}$ also works
for restricted distributions. The proof is given in Appendix A.

Lemma 1.6. Suppose T is a $\mu$-reducing search tree for $\mathcal{F}$. Then T is $O(1/\log(1/\mu))$-optimal for restricted
searches over $\mathcal{F}$.

We list theorems about data structures that are built in the learning phase. Similar structures were first
constructed in [ACC+11], and the following can be proved using their ideas. The data structures involve
construction of slab structures and specialized search trees for each distribution $\mathcal{D}_i$. It is also important
that these trees can be represented in small space, to satisfy the requirements of Theorem 1.3. The following
lemmas give us the details of the data structures required. Because this is not a major contribution of this
paper, we relegate the details to §5.

Lemma 1.7. We can construct a slab structure S with $O(n)$ leaf slabs such that, with probability $1 - n^{-3}$
over the construction of S, the following holds. For a leaf slab $\lambda$ of S, let $X_\lambda$ denote the number of points
in a random input P that fall into $\lambda$. For every leaf slab $\lambda$ of S, we have $\mathrm{E}[X_\lambda^2] = O(1)$. The construction
takes $O(\log n)$ rounds and $O(n \log^2 n)$ time.

Lemma 1.8. Let $\varepsilon > 0$ be a fixed parameter. In $O(n^\varepsilon)$ rounds and $O(n^{1+\varepsilon})$ time, we can construct search
trees $T_1, T_2, \ldots, T_n$ over S such that the following holds: (i) the trees can be represented in $O(n^{1+\varepsilon})$ total
space; (ii) with probability $1 - n^{-3}$ over the construction of the $T_i$'s, every $T_i$ is $O(1/\varepsilon)$-optimal for restricted
searches over $\mathcal{D}_i$.

2 Outline 

We start by providing a very informal overview of the algorithm. Then, we shall explain how the optimality 
is shown. 

If the points of P are sorted by x-coordinate, the maxima of P can be found easily by a right-to-left
sweep over P: we maintain the largest y-coordinate Y of the points traversed so far; when a point p is visited
in the traversal, if $y(p) < Y$, then p is non-maximal, and the point $p_j$ with $Y = y(p_j)$ gives a per-point
certificate for p's non-maximality. If $y(p) > Y$, then p is maximal, and can be put at the beginning of the
certificate list of maxima of P.
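The sweep just described can be written down directly. The sketch below sorts by x-coordinate itself and then traverses from right to left; the function name `maxima_by_sweep` and the dictionary used for the per-point certificates are illustrative choices, and the code assumes that no two points share a coordinate.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def maxima_by_sweep(points: List[Point]) -> Tuple[List[int], Dict[int, int]]:
    """Right-to-left sweep over the points sorted by x-coordinate.

    Returns the indices of the maxima, sorted from left to right, and for
    every non-maximal index the index of a point that dominates it.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    maxima: List[int] = []
    witness: Dict[int, int] = {}
    best = None  # index of the point with the largest y seen so far
    for i in reversed(order):
        if best is not None and points[i][1] < points[best][1]:
            witness[i] = best   # best lies to the right and is higher
        else:
            maxima.append(i)    # nothing to the right is higher
            best = i
    maxima.reverse()            # report the maxima from left to right
    return maxima, witness
```

Appending and reversing at the end has the same effect as putting each new maximal point at the beginning of the list. After the sort, the traversal itself takes linear time.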

This suggests the following approach to a self-improving algorithm for maxima: sort P with a
self-improving sorter and then use the traversal. The self-improving sorter of [ACC+11] works by locating each
point of P within the slab structure S of Lemma 1.7 using the trees of Lemma 1.8.

While this approach does use S and the $T_i$'s, it is not optimal for maxima, because the time spent finding
the exact sorted order of non-maximal points may be wasted: in some sense, we are learning much more
information about the input P than necessary. To deduce the list of maxima, we do not need the sorted
order of all points of P: it suffices to know the sorted order of just the maxima! An optimal algorithm would 
probably locate the maximal points in S and would not bother locating "extremely non-maximal" points. 
This is, in some sense, the difficulty that output-sensitive algorithms face. 

As a thought experiment, let us suppose that the maximal points of P are known to us, but not in sorted 
order. We search only for these in S and determine the sorted list of maximal points. We can argue that the 
optimal algorithm must also (in essence) perform such a search. We also need to find per-point certificates 
for the non-maximal points. We use the slab structure S and the search trees, but now we shall be very 
conservative in our searches. Consider the search for a point $p_i$. At any intermediate stage of the search, $p_i$
is placed in a slab S. This rough knowledge of $p_i$'s location may already suffice to certify its non-maximality:
let m denote the leftmost maximal point to the right of S (since the sorted list of maxima is known, this
information can be easily deduced). We check if m dominates $p_i$. If so, we have a per-point certificate for $p_i$
and we promptly terminate the search for $p_i$. Otherwise, we continue the search by a single step and repeat.
We expect that many searches will not proceed too long, achieving a better position to compete with the 
optimal algorithm. 
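The thought experiment above can be sketched in code. This is only an illustration under simplifying assumptions that are not in the text: the search tree is a balanced binary tree over the vertical lines rather than a distribution-tuned one, the sorted maxima are passed in as a list, and the lookup of the leftmost maximal point to the right of the current slab uses a binary search whose cost is not accounted for. The function name `conservative_search` is hypothetical.

```python
import bisect
from typing import List, Tuple

Point = Tuple[float, float]


def conservative_search(boundaries: List[float],
                        maxima_sorted: List[Point],
                        p: Point):
    """Search for p among the leaf slabs, stopping as soon as a known
    maximal point to the right of the current slab dominates p.

    boundaries    -- sorted x-coordinates of the vertical lines
    maxima_sorted -- the (assumed known) maxima, sorted by increasing x
    Returns ("dominated", m, steps) or ("located", leaf_index, steps).
    """
    max_xs = [m[0] for m in maxima_sorted]
    lo, hi = 0, len(boundaries)   # current slab = leaf slabs lo..hi
    steps = 0
    while True:
        if hi < len(boundaries):  # the slab has a finite right boundary
            k = bisect.bisect_left(max_xs, boundaries[hi])
            if k < len(maxima_sorted):
                m = maxima_sorted[k]   # leftmost maximal point right of the slab
                if m[1] >= p[1]:       # m is right of p, so m dominates p
                    return ("dominated", m, steps)
        if lo == hi:
            return ("located", lo, steps)
        mid = (lo + hi) // 2           # advance the search by a single step
        if p[0] < boundaries[mid]:
            hi = mid
        else:
            lo = mid + 1
        steps += 1
```

Because the maxima form a staircase, the leftmost maximal point to the right of a slab is also the highest one there, so it is the only candidate that needs to be tested at each step.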

Non-maximal points that are dominated by many maximal points will usually have a very short search. 
Points that are "nearly" maximal will require a much longer search. So this approach should derive just 
the "right" amount of information to determine the maxima output. But wait! Didn't we assume that the 
maximal points were known? Wasn't this crucial in cutting down the search time? This is too much of an 
assumption, and because the maxima are highly dependent on each other, it is not clear how to determine 
which points are maximal before performing searches. 

The final algorithm overcomes this difficulty by interleaving the searches for sorting the points with 
confirmation of the maximality of some points, in a rough right-to-left order that is a more elaborate version 
of the traversal scheme given above for sorted points. The searches for all points $p_i$ (in their respective trees
$T_i$) are performed "together", and their order is carefully chosen. At any intermediate stage, each point $p_i$ is
located in some slab $S_i$, represented by some node of its search tree. We choose a specific point and advance
its search by one step. This order is very important, and is the basis of our optimality. The algorithm is 
described in detail and analyzed in §4. 

Arguing about optimality. A major challenge of self-improving algorithms is the strong requirement 
of optimality for the distribution $\mathcal{D}$. We focus on the model of linear comparison trees, and let T be an
optimal tree for distribution $\mathcal{D}$. (There may be distributions where such an exact T does not exist, but we
can always find one that is near optimal.) One of our key insights is that when $\mathcal{D}$ is a product distribution,
then we can convert T to T', a restricted comparison tree whose expected depth is only a constant factor
worse. In other words, there exists a near optimal restricted comparison tree that computes the maxima.

In such a tree, a leaf is labeled with a sequence of regions $\mathcal{R} = (R_1, R_2, \ldots, R_n)$. Any input $P =
(p_1, p_2, \ldots, p_n)$ such that $p_i \in R_i$ for all i will lead to this leaf. Since the distributions are independent, we
can argue that the probability that an input leads to this leaf is $\prod_i \Pr_{p_i \sim \mathcal{D}_i}[p_i \in R_i]$. Furthermore, the depth
of this leaf can be shown to be $-\sum_i \log \Pr[p_i \in R_i]$. This gives us a concrete bound that we can exploit.

It now remains to show that if we start with a random input from $\mathcal{R}$, the expected running time is
bounded by the sum given above. We will argue that for such an input, as soon as the search for $p_i$ locates
it inside $R_i$, the search will terminate. This leads to the optimal running time.

3 The Computational Model and Lower Bounds 
3.1 Reducing to restricted comparison trees 

We prove that when P is generated probabilistically, it suffices to focus on restricted comparison trees. To 
show this, we provide a sequence of transformations, starting from the more general comparison tree, that 
results in a restricted linear comparison tree of comparable expected depth. The main lemma of this section 
is the following.

Lemma 3.1. Let T be a finite linear comparison tree and $\mathcal{D}$ be a product distribution over points. Then there
exists a restricted comparison tree T' with expected depth $d_\mathcal{D}(T') = O(d_\mathcal{D}(T))$, as $d_\mathcal{D}(T) \to \infty$.

We will describe a transformation from T into a restricted comparison tree with similar depth. The first 
step is to show how to represent a single comparison by a restricted linear comparison tree, provided that 
P is drawn from a product distribution. The final transformation basically replaces each node of T by the 
subtree given by the next claim. For convenience, we will drop the subscript $\mathcal{D}$ from $d_\mathcal{D}$, since we only
focus on a fixed distribution.

Claim 3.2. Consider a comparison C as described in Definition 1.2, where the comparisons are listed in
increasing order of simplicity. Let $\mathcal{D}'$ be a product distribution for P such that each $p_i$ is drawn from a
polygonal region $R_i$. Then either C is the simplest, type (i) comparison, or there exists a restricted linear
comparison tree $T_C$ that resolves the comparison C such that the expected depth of $T_C$ (over the distribution
$\mathcal{D}'$) is O(1), and all comparisons used in $T_C$ are simpler than C.

Proof. v is of type (ii). This means that v needs to determine whether an input point $p_i$ lies to the left of
the directed line $\ell$ through another input point $p_j$ with a fixed slope a. We replace this comparison with a
binary search. Let $R_j$ be the region in $\mathcal{D}'$ corresponding to $p_j$. Take a halving line $\ell_1$ for $R_j$ with slope a.
Then perform two comparisons to determine on which side of $\ell_1$ the inputs $p_i$ and $p_j$ lie. If $p_i$ and $p_j$ lie
on different sides of $\ell_1$, we declare success and resolve the original comparison accordingly. Otherwise, we
replace $R_j$ with the appropriate new region and repeat the process until we can declare success. Note that
in each attempt the success probability is at least 1/4. The resulting restricted tree $T_C$ can be infinite.
Nonetheless, the probability that an evaluation of $T_C$ leads to a node of depth k is at most $2^{-\Omega(k)}$, so the
expected depth is O(1).
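The binary search in the type (ii) case can be illustrated in one dimension. The sketch below assumes a vertical fixed slope and assumes that $p_j$ is uniform on an interval, so that the halving line of the current region is simply its midpoint; both assumptions, as well as the function name `resolve_left_of`, are made only for this example. Every comparison it performs is against a line that does not depend on the input, which is the point of the construction.

```python
from typing import Tuple


def resolve_left_of(xi: float, xj: float,
                    lo: float = 0.0, hi: float = 1.0,
                    max_rounds: int = 64) -> Tuple[bool, int]:
    """Decide whether xi < xj using only comparisons of xi and of xj
    against fixed vertical lines (the midpoints of a shrinking interval).

    Assumes xj lies in [lo, hi] and is uniform there, so the midpoint is
    a halving line. Returns (xi < xj, number of rounds used).
    """
    for rounds in range(1, max_rounds + 1):
        mid = (lo + hi) / 2.0      # halving line of the current region
        i_left = xi < mid
        j_left = xj < mid
        if i_left != j_left:       # the line separates the two points
            return i_left, rounds
        if j_left:                 # shrink the region that contains xj
            hi = mid
        else:
            lo = mid
    raise ValueError("points too close to separate within max_rounds")
```

For instance, `resolve_left_of(0.2, 0.7)` succeeds in the first round, while `resolve_left_of(0.6, 0.7)` needs three rounds because the first two halving lines leave both points on the same side.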

v is of type (iii). Here the node v needs to determine whether an input point $p_i$ lies to the left of the
directed line $\ell$ through another input point $p_j$ and a fixed point q.

We partition the plane by a constant-sized family of cones, each with apex q, such that for each cone V
in the family, the probability that line $q p_j$ meets V (other than at q) is at most 1/2. Such a family could be
constructed by sweeping a line around q, or by taking a sufficiently large, but constant-sized, sample from
the distribution of $p_j$, and bounding the cones by all lines through q and each point of the sample. Such a
construction has a non-zero probability of success, and therefore the described family of cones exists.

We build a restricted tree that locates a point in the corresponding cone. For each cone V, we can
recursively build such a family of cones (inside V), and build a tree for this structure as well. Repeating for
each cone, this leads to an infinite restricted tree $T_C$. We search for both $p_i$ and $p_j$ in $T_C$. When we locate
$p_i$ and $p_j$ in two different cones of the same family, then the comparison between $p_i$ and $q p_j$ is resolved and the
search terminates. The probability that they lie in the same cone of a given family is at most 1/2, so the
probability that the evaluation leads to k steps is at most $2^{-\Omega(k)}$.

v is of type (iv). Here the node v needs to determine whether an input point $p_i$ lies to the left of the
directed line $\ell$ through input points $p_j$ and $p_k$.

We partition the plane by a constant-sized family of triangles and cones, such that for each region V in
the family, the probability that the line through $p_j$ and $p_k$ meets V is at most 1/2. Such a family could be
constructed by taking a sufficiently large random sample of pairs $p_j$ and $p_k$ and triangulating the arrangement
of the lines through each pair. Such a construction has a non-zero probability of success, and therefore such
a family exists. (Other than the source of the random lines used in the construction, this scheme goes back
at least to [Cla87]; a tighter version, called a cutting, could also be used [Cha93].)

When computing C, suppose $p_i$ is in region V of the family. If the line $p_j p_k$ does not meet V, then the
comparison outcome is known immediately. This occurs with probability at least 1/2. Moreover, determining
the region containing $p_i$ can be done with a constant number of comparisons of type (i), and determining if
$p_j p_k$ meets V can be done with a constant number of comparisons of type (iii); for the latter, suppose V is
a triangle. If $p_j \in V$, then $p_j p_k$ meets V. Otherwise, suppose $p_k$ is above all the lines through $p_j$ and each
vertex of V; then $p_j p_k$ does not meet V. Also, if $p_k$ is below all the lines through $p_j$ and each vertex, then
$p_j p_k$ does not meet V. Otherwise, $p_j p_k$ meets V. So a constant number of type (i) and type (iii) queries
suffice.

By recursively building a tree for each region V of the family, comparisons of type (iv) can be done 
via a tree whose nodes use comparisons of type (i) and (iii) only. Since the probability of resolving the 
comparison is at least 1/2 with each family of regions that is visited, the expected number of nodes visited 
is constant. □ 

Proof of Lemma 3.1. We transform T into a tree T' that has no comparisons of type (iv), by using the construction
of Claim 3.2 where nodes of type (iv) are replaced by a tree. We then transform T' into a tree T'' that has no
comparisons of type (iii) or (iv), and finally transform T'' into a restricted tree. Each such transformation
is done in the same general way, using one case of Claim 3.2, so we focus on the first one.

We incrementally transform T into the tree T'. In each such step, we have a partial restricted comparison
tree T'' that will eventually become T'. Furthermore, during the process each node of T is in one of three
different states. It is either finished, fringe, or untouched. Finally, we have a function S that assigns to each
finished and to each fringe node of T a subset S(v) of nodes in T''.

The initial situation is as follows: all nodes of T are untouched except for the root, which is fringe.
Furthermore, the partial tree T'' consists of a single root node r and the function S assigns the root of T to
the set {r}.

Now our transformation proceeds as follows. We pick a fringe node v in T, and mark v as finished. For
each child v' of v, if v' is an internal node of T, we mark it as fringe. Otherwise, we mark v' as finished.
Next, we apply Claim 3.2 to each node $w \in S(v)$. Note that this is a valid application of the claim, since
w is a node of T'', a restricted tree. Hence $\mathcal{R}_w$ is a product set, and the distribution $\mathcal{D}$ restricted to $\mathcal{R}_w$ is
a product distribution. Hence, replace each node $w \in S(v)$ in T'' by the subtree given by Claim 3.2. Now
S(v) contains the roots of these subtrees. Each leaf of each such subtree corresponds to an outcome of the
comparison in v. (Potentially, the subtrees are countably infinite, but the expected number of steps to reach
a leaf is constant.) For each child v' of v, we define S(v') as the set of all such leaves that correspond to
the same outcome of the comparison as v'. We continue this process until there are no fringe nodes left. By
construction, the resulting tree T' is restricted.

It remains to argue that $d_{T'} = O(d_T)$. Let v be a node of T. We define two random variables $X_v$ and
$Y_v$. The variable $X_v$ is the indicator random variable for the event that the node v is traversed for a random
input $P \sim \mathcal{D}$. The variable $Y_v$ denotes the number of nodes traversed in T' that correspond to v (i.e., the
number of nodes needed to simulate the comparison at v, if it occurs). We have $d_T = \sum_{v \in T} \mathrm{E}[X_v]$, because
if the leaf corresponding to an input $P \sim \mathcal{D}$ has depth d, exactly d nodes are traversed to reach it. We also
have $d_{T'} = \sum_{v \in T} \mathrm{E}[Y_v]$, since each node in T' corresponds to exactly one node v in T. Claim 3.3 below
shows that $\mathrm{E}[Y_v] = O(\mathrm{E}[X_v])$, which completes the proof. □

Claim 3.3. $E[Y_v] \le c\,E[X_v]$.

Proof. Note that $E[X_v] = \Pr[X_v = 1] = \Pr[P \in \mathcal{R}_v]$. Since the sets $\mathcal{R}_w$, $w \in S(v)$, partition $\mathcal{R}_v$, we can
write $E[Y_v]$ as

$$E[Y_v \mid X_v = 0]\,\Pr[X_v = 0] + \sum_{w \in S(v)} E[Y_v \mid P \in \mathcal{R}_w]\,\Pr[P \in \mathcal{R}_w].$$

Since $Y_v = 0$ if $P \notin \mathcal{R}_v$, we have $E[Y_v \mid X_v = 0] = 0$ and also $\Pr[P \in \mathcal{R}_v] = \sum_{w \in S(v)} \Pr[P \in \mathcal{R}_w]$.
Furthermore, by Claim 3.2, we have $E[Y_v \mid P \in \mathcal{R}_w] \le c$. The claim follows. □

3.2 Entropy-sensitive comparison trees 

Since every linear comparison tree can be made restricted, we can incorporate the entropy of $\mathcal{D}$ into the
lower bound. For this we define entropy-sensitive trees, which are useful because the depth of a node v is
related to the probability of the corresponding region $\mathcal{R}_v$.

Definition 3.4. We call a restricted linear comparison tree entropy-sensitive if each comparison "$p_i \in \ell^+$?"
is such that $\ell$ is a halving line for the current region $R_i$.

Lemma 3.5. Let v be a node in an entropy-sensitive comparison tree, and let $\mathcal{R}_v = R_1 \times R_2 \times \cdots \times R_n$.
Then $d_v = -\sum_{i=1}^{n} \log \Pr[R_i]$.

Proof. We use induction on the depth of v. For the root r we have $d_r = 0$. Now, let v' be the parent of v.
Since T is entropy-sensitive, we reach v after performing a comparison with a halving line in v'. This halves
the measure of exactly one region in $\mathcal{R}_{v'}$, so the sum increases by one. □
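The bookkeeping behind Lemma 3.5 can be checked numerically. The following is a minimal sketch, not part of the original argument: it tracks only the measures $\Pr[R_i]$ (not the regions themselves), assumes base-2 logarithms, and the function name `depth_vs_entropy` is hypothetical.

```python
import math
import random

def depth_vs_entropy(n, steps, seed=0):
    """Apply `steps` halving comparisons to n regions.

    Returns (depth, -sum_i log2 Pr[R_i]); Lemma 3.5 says the two agree.
    """
    rng = random.Random(seed)
    measure = [1.0] * n            # at the root every R_i is the whole plane
    depth = 0
    for _ in range(steps):
        i = rng.randrange(n)       # each comparison involves a single point p_i
        measure[i] /= 2.0          # a halving line cuts Pr[R_i] in half
        depth += 1
    return depth, -sum(math.log2(m) for m in measure)
```

Because every comparison halves exactly one factor of the product, the two returned quantities coincide regardless of which points the comparisons involve.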

As in Lemma 3.1, we can make every restricted linear comparison tree entropy-sensitive without affecting 
its expected depth too much. 

Lemma 3.6. Let T be a restricted linear comparison tree. Then there exists an entropy-sensitive comparison
tree T' with expected depth $d_{T'} = O(d_T)$.

Proof. The proof extends the proof of Lemma 3.1, via an extension to Claim 3.2. We can regard a comparison
against a fixed halving line as simpler than a comparison against an arbitrary fixed line. Our extension of
Claim 3.2 is the claim that any type (i) node can be replaced by a tree with constant expected depth, as
follows. A comparison "$p_i \in \ell^+$?" can be replaced by a sequence of comparisons to halving lines. Similar to the
reduction for type (ii) comparisons in Claim 3.2, this is done by binary search. That is, let $\ell_1$ be a halving
line for $R_i$ parallel to $\ell$. We compare $p_i$ with $\ell_1$. If this resolves the original comparison, we declare success.
Otherwise, we repeat the process with the halving line for the new region $R'_i$. In each step, the probability
of success is at least 1/2. The resulting comparison tree has constant expected depth; we now apply the
construction of Lemma 3.1 to argue that for a restricted tree T there is an entropy-sensitive version T' whose
expected depth is larger by at most a constant factor. □
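The binary search in this proof can be illustrated in one dimension. The sketch below is an illustration under simplifying assumptions, not the construction itself: the point is a scalar x drawn uniformly from [0, 1), so the "halving line" of the current region [lo, hi) is its midpoint, and the original comparison is "x < t?". The names `resolve_with_halving` and `average_steps` are hypothetical.

```python
import random

def resolve_with_halving(x, t):
    """Decide whether x < t using only comparisons of x against the
    midpoint (halving point) of its current region [lo, hi)."""
    lo, hi = 0.0, 1.0
    steps = 0
    while lo < t < hi:             # region still straddles the threshold t
        mid = (lo + hi) / 2.0      # halving point of the current region
        steps += 1
        if x < mid:
            hi = mid
        else:
            lo = mid
    # Either hi <= t (so x < t) or t <= lo (so x >= t).
    return hi <= t, steps

def average_steps(trials, seed=0):
    """Empirical mean number of halving comparisons over random x and t."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, steps = resolve_with_halving(rng.random(), rng.random())
        total += steps
    return total / trials
```

Each halving comparison resolves the original one with probability at least 1/2, so the number of steps is dominated by a geometric random variable with mean 2; `average_steps` should therefore return a value close to 2.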

Recall that $\mathrm{OPT}_{\mathcal{D}}$ is the expected depth of an optimal linear comparison tree that computes the maxima
for $P \sim \mathcal{D}$. We now describe how to characterize $\mathrm{OPT}_{\mathcal{D}}$ in terms of entropy-sensitive comparison trees. We
first state a simple property that follows directly from the definition of certificates and the properties of 
restricted comparison trees. 

Proposition 3.7. Consider a leaf v of a restricted linear comparison tree T computing the maxima. Let $R_i$
be the region associated with a non-maximal point $p_i \in P$ in $\mathcal{R}_v$. There exists some region $R_j$ associated with
an extremal point $p_j$ such that every point in $R_j$ dominates every point in $R_i$.

We now enhance the notion of a certificate (Definition 1.1) to make it more useful for our algorithm's 
analysis. For technical reasons, we want points to be "well-separated" according to the slab structure S. By 
Prop. 3.7, every non-maximal point is associated with a dominating region. 

Definition 3.8. Let S be a slab structure. A certificate for an input P is called S-labeled if the following 
holds. Every maximal point is labeled with the leaf slab of S containing it. Every non-maximal point is either 
placed in the containing leaf slab, or is separated from a dominating region by a slab boundary. 

We naturally extend this to trees that compute the S-labeled maxima. 

Definition 3.9. A linear comparison tree T computes the S-labeled maxima of P if each leaf v of T is 
labeled with an S-labeled certificate that is valid for every possible input $P \in \mathcal{R}_v$.

Lemma 3.10. There exists an entropy-sensitive comparison tree T computing the S-labeled maxima whose
expected depth over $\mathcal{D}$ is $O(n + \mathrm{OPT}_{\mathcal{D}})$.

Proof. Start with an optimal linear comparison tree T' that computes the maxima. At every leaf, we have
a list M with the maximal points in sorted order. We merge M with the list of slab boundaries of S to label
each maximal point with the leaf slab of S containing it. We now deal with the non-maximal points. Let $R_i$
be the region associated with a non-maximal point $p_i$, and $R_j$ be the dominating region. Let $\lambda$ be the leaf
slab containing $R_j$. Note that the x-projection of $R_i$ cannot extend to the right of $\lambda$. If there is no slab
boundary separating $R_i$ from $R_j$, then $R_i$ must intersect $\lambda$. With one more comparison, we can place $p_i$
inside $\lambda$ or strictly to the left of it. All in all, with O(n) more comparisons than T', we have a tree T" that
computes the S-labeled maxima. Hence, the expected depth is $\mathrm{OPT}_{\mathcal{D}} + O(n)$. Now we apply Lemmas 3.1
and 3.6 to T" to get an entropy-sensitive comparison tree T computing the S-labeled maxima with expected
depth $O(n + \mathrm{OPT}_{\mathcal{D}})$. □



4 The algorithm 

In the learning phase, the algorithm constructs a slab structure S and search trees $T_i$, as given in Lemmas 1.7
and 1.8. Henceforth, we assume that we have these data structures, and will describe the algorithm in the
limiting (or stationary) phase. Our algorithm proceeds by progressively searching for each point $p_i$ in its tree
$T_i$. However, we need to choose the order of the searches carefully.

At any stage of the algorithm, each point $p_i$ is placed in some slab $S_i$. The algorithm maintains a set
A of active points. An inactive point is either proven to be non-maximal, or it has been placed in a leaf
slab. The active points are stored in a data structure L(A). This structure is similar to a heap and supports
the operations delete, decrease-key, and find-max. The key associated with an active point $p_i$ is the right
boundary of the slab $S_i$ (represented as an element of $[|\mathbf{S}|]$).

We list the variables that the algorithm maintains. The algorithm is initialized with A = P, and each 
Si is the largest slab in S. Hence, all points have key |S|, and we insert all these keys into L(A). 

• A, L(A): the list A of active points stored in data structure L(A).

• $\lambda$, B: Let m be the largest key among the active points. Then $\lambda$ is the leaf slab whose right boundary
is m and B is a set of points located in $\lambda$. Initially B is empty and m is $|\mathbf{S}|$, corresponding to the $+\infty$
boundary of the rightmost, infinite, slab.

• M, $\hat{p}$: M is a sorted (partial) list of currently discovered maximal points and $\hat{p}$ is the leftmost among
those. Initially M is empty and $\hat{p}$ is a "null" point that dominates no input point.

The algorithm involves a main procedure Search, and an auxiliary procedure Update. The procedure
Search chooses a point and advances its search by a single step in the appropriate tree. Occasionally, it will
invoke Update to change the global variables. The algorithm repeatedly calls Search until L(A) is empty.
After that, we perform a final call to Update in order to process any points that might still remain in B.

Search. Let $p_i$ be obtained by performing a find-max in L(A). If the maximum key m in L(A) is less than
the right boundary of the current leaf slab $\lambda$, we invoke Update. If $p_i$ is dominated by $\hat{p}$ (the leftmost
maximal point found so far), we delete $p_i$ from L(A). If not, we
advance the search of $p_i$ in $T_i$ by a single step, if possible. This updates the slab $S_i$. If the right boundary of
$S_i$ has decreased, we perform the appropriate decrease-key operation on L(A). (Otherwise, we do nothing.)

Suppose the point $p_i$ reaches a leaf slab $\lambda'$. If $\lambda' = \lambda$, we remove $p_i$ from L(A) and insert it in B (in time
O(|B|)). Otherwise, we leave $p_i$ in L(A).

Update. We sort all the points in B and update the list of current maxima. As Claim 4.1 will show, we
have the sorted list of maxima to the right of $\lambda$. Hence, we can append to this list in O(|B|) time. We reset
$B = \emptyset$, set $\lambda$ to the leaf slab to the left of m, and return.
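To make the interplay of Search and Update concrete, here is a simplified sketch under several assumptions that are not from the text: every point uses the same balanced tree over the leaf slabs instead of a learned tree $T_i$, all coordinates are distinct, buckets are Python sets rather than linked lists, and a point that reaches the current leaf slab is moved to B the next time find-max returns it rather than immediately. The function name `maxima_by_slabs` and the parameter `bounds` (the sorted list of finite slab boundaries) are hypothetical.

```python
import math

def maxima_by_slabs(points, bounds):
    """Return the maximal points, listed right to left."""
    n, L = len(points), len(bounds) + 1        # leaf slabs are numbered 0..L-1
    slab = [(0, L)] * n                        # slab of p_i = leaf slabs lo..hi-1
    buckets = [set() for _ in range(L + 1)]    # L(A): bucket k holds the points with key k
    buckets[L] = set(range(n))                 # every point starts with key |S| = L
    m, lam = L, L - 1                          # maximum key; current leaf slab lambda
    B, M = [], []                              # points located in lambda; maxima so far
    best_y = -math.inf                         # y-coordinate of p-hat

    def update():
        nonlocal best_y
        for i in sorted(B, key=lambda i: -points[i][0]):   # scan lambda right to left
            if points[i][1] > best_y:
                best_y = points[i][1]
                M.append(i)
        B.clear()

    while True:
        while m > 0 and not buckets[m]:        # keys only decrease: scan downward
            m -= 1
        if m == 0:
            break
        if m < lam + 1:                        # max key fell below lambda's right boundary
            update()
            lam = m - 1
        i = next(iter(buckets[m]))             # find-max
        x, y = points[i]
        lo, hi = slab[i]
        if y < best_y:                         # p_i is dominated by p-hat
            buckets[m].discard(i)
        elif hi - lo == 1:                     # p_i is located in the leaf slab lambda
            buckets[m].discard(i)
            B.append(i)
        else:                                  # advance the search for p_i by one step
            mid = (lo + hi) // 2               # leaf slab tested at this tree node
            left = bounds[mid - 1] if mid > 0 else -math.inf
            right = bounds[mid] if mid < L - 1 else math.inf
            if x < left:
                slab[i] = (lo, mid)
            elif x >= right:
                slab[i] = (mid + 1, hi)
            else:
                slab[i] = (mid, mid + 1)
            new_hi = slab[i][1]
            if new_hi != hi:                   # decrease-key
                buckets[hi].discard(i)
                buckets[new_hi].add(i)
    update()                                   # process points still remaining in B
    return [points[i] for i in M]
```

The sketch preserves the order of events in the text: searches are only advanced for points with the largest key, so a point far to the left is never searched to completion if a maximum found further right already dominates it.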

We prove some preliminary claims. We state an important invariant maintained by the algorithm, and 
then give a construction for the data structure L(A). 

Claim 4.1. At any time in the algorithm, the maxima of all points to the right of $\lambda$ have been determined
in sorted order.

Proof. The proof is by backward induction on m, the right boundary of $\lambda$. When $m = |\mathbf{S}|$, this is
trivially true. Let us assume it is true for a given value of m, and trace the algorithm's behavior until the
maximum key becomes smaller than m (which is done in Update). When Search processes a point p with
a key of m, then either (i) the key value decreases; (ii) p is dominated by $\hat{p}$; or (iii) p is eventually placed in
$\lambda$ (whose right boundary is m). In all cases, when the maximum key decreases below m, all points in $\lambda$ are
either proven to be non-maximal or are in B. By the induction hypothesis, we already have a sorted list of
maxima to the right of m. The procedure Update will sort the points in B and all maximal points to the
right of $m - 1$ will be determined. □

Claim 4.2. Suppose there are x find-max operations and y decrease-key operations. We can implement the
data structure L(A) such that the total time for the operations is O(n + x + y). The storage requirement is
O(n).


Proof. We represent L(A) as an array of lists. For every $k \in [|\mathbf{S}|]$, we keep a list of points whose key values
are k. We maintain m, the current maximum key. The total storage is O(n). A find-max can trivially be
done in O(1) time, and an insert is done by adding the element to the appropriate list. A delete is done by
deleting the element from the list (supposing appropriate pointers are available). We now have to update the
maximum. If the list at m is non-empty, no action is required. If it is empty, we check sequentially whether
the list at $m - 1, m - 2, \ldots$ is empty. This will eventually lead to the maximum. To do a decrease-key, we
delete, insert, and then update the maximum.

Note that since all key updates are decrease-keys, the maximum can only decrease. Hence, the total
overhead for scanning for a new maximum is O(n). □
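The array-of-lists structure from this proof can be sketched as follows. This is an illustrative version, not a definitive implementation: the class name `BucketMaxQueue` is hypothetical, Python sets stand in for the linked lists with pointers (giving expected rather than worst-case constant time per delete), and the maximum is updated lazily inside `find_max` instead of after each delete.

```python
class BucketMaxQueue:
    """Keys are integers in 1..K; the maximum key never increases as long
    as items are only deleted or have their keys decreased."""

    def __init__(self, K):
        self.buckets = [set() for _ in range(K + 1)]   # buckets[k]: items with key k
        self.key = {}                                  # item -> current key
        self.m = K                                     # upper bound on the maximum key

    def insert(self, item, k):
        self.buckets[k].add(item)
        self.key[item] = k
        self.m = max(self.m, k)

    def delete(self, item):
        self.buckets[self.key.pop(item)].discard(item)

    def decrease_key(self, item, k):
        assert k <= self.key[item], "keys may only decrease"
        self.delete(item)
        self.insert(item, k)

    def find_max(self):
        """Return (item, key) for some item of maximum key, or None if empty."""
        while self.m > 0 and not self.buckets[self.m]:
            self.m -= 1            # total downward scanning is at most K steps
        if self.m == 0:
            return None
        return next(iter(self.buckets[self.m])), self.m
```

Since `m` only moves downward between inserts, the scanning cost summed over all calls is bounded by the number of buckets, which matches the amortized bound in the claim.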

4.1 Running time analysis 

The aim of this section is to prove the following lemma. 
Lemma 4.3. The algorithm runs in $O(n + \mathrm{OPT}_{\mathcal{D}})$ time.

We can easily bound the running time of all calls to Update.
Claim 4.4. The expected time for all calls to Update is O(n).

Proof. The total time taken for all calls to Update is at most the time taken to sort points within leaf slabs.
By Lemma 1.7, this takes expected time O(n). □


The important claim is the following, since it allows us to relate the time spent by Search to the 
entropy-sensitive comparison trees. Lemma 4.3 follows directly from this. 

Claim 4.5. Let T be an entropy-sensitive comparison tree computing S-labeled maxima. Consider a leaf v
labeled with the regions $\mathcal{R}_v = (R_1, R_2, \ldots, R_n)$, and let $d_v$ denote the depth of v. Conditioned on $P \in \mathcal{R}_v$,
the expected running time of Search is $O(n + d_v)$.

Proof. For each $R_i$, let $S_i$ be the smallest slab of S that completely contains $R_i$. We will show that the
algorithm performs at most an $S_i$-restricted search for input $P \in \mathcal{R}_v$. If $p_i$ is maximal, then $R_i$ is contained
in a leaf slab (this is because the output is S-labeled). Hence $S_i$ is a leaf slab and an $S_i$-restricted search for
a maximal $p_i$ is just a complete search.

Now consider a non-maximal $p_i$. By the properties of S-labeled maxima, the associated region $R_i$ is
either inside a leaf slab or is separated by a slab boundary from the dominating region $R_j$. In the former
case, an $S_i$-restricted search is a complete search. In the latter case, we argue that an $S_i$-restricted search
suffices to process $p_i$. This follows from Claim 4.1: by the time an $S_i$-restricted search finishes, all maxima
to the right of $S_i$ have been determined. In particular, we have found $p_j$, and thus $\hat{p}$ dominates $p_i$. Hence,
the search for $p_i$ will proceed no further.

The expected search time conditioned on $P \in \mathcal{R}_v$ is the sum (over i) of the conditional expected
$S_i$-restricted search times. Let $\mathcal{E}_i$ denote the event that $p_i \in R_i$, and $\mathcal{E}$ be the event that $P \in \mathcal{R}_v$. We have
$\mathcal{E} = \bigwedge_i \mathcal{E}_i$. By the independence of the distributions and linearity of expectation,

$$E_{\mathcal{E}}[\text{search time}] = \sum_{i=1}^{n} E_{\mathcal{E}}[S_i\text{-restricted search time for } p_i] = \sum_{i=1}^{n} E_{\mathcal{E}_i}[S_i\text{-restricted search time for } p_i].$$

By Lemma 1.6, the time for an $S_i$-restricted search conditioned on $p_i \in R_i$ is $O(-\log \Pr[p_i \in R_i] + 1)$. By
Lemma 3.5, $d_v = -\sum_{i=1}^{n} \log \Pr[p_i \in R_i]$, completing the proof. □

We can now prove the main lemma. 

Proof of Lemma 4.3. By Lemma 3.10, there exists an entropy-sensitive comparison tree T that computes the
S-labeled maxima with expected depth $O(\mathrm{OPT}_{\mathcal{D}} + n)$. According to Claim 4.5, the expected running time of
Search is $O(\mathrm{OPT}_{\mathcal{D}} + n)$. Claim 4.4 tells us the expected time for Update is O(n), and we add these bounds
to complete the proof. □



5 Data structures obtained during the learning phase 

Learning the vertical slab structure S is very similar to learning the V-list in Ailon et al. [ACC+11,
Lemma 3.2]. We repeat the construction and proof for convenience: take the union of the first $k = \log n$
inputs $P_1, P_2, \ldots, P_k$, and sort those points by x-coordinates. This gives a list $x_0, x_1, \ldots, x_{nk-1}$. Take
the n values $x_0, x_k, x_{2k}, \ldots, x_{(n-1)k}$. They define the boundaries for S. We recall a useful and well-known
fact [ACC+11, Claim 3.3].

Claim 5.1. Let $Z = \sum_i Z_i$ be a sum of nonnegative random variables such that $Z_i = O(1)$ for all i,
$E[Z] = O(1)$, and for all $i \ne j$, $E[Z_i Z_j] = E[Z_i]\,E[Z_j]$. Then $E[Z^2] = O(1)$.
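As a concrete illustration of the slab construction described at the start of this section, the following sketch pools the learning inputs, sorts the x-coordinates, and keeps every k-th value. It is a minimal sketch under assumptions: points are `(x, y)` tuples, the logarithm is taken base 2 and rounded up, and the function names `num_learning_inputs` and `learn_slab_boundaries` are hypothetical.

```python
import math

def num_learning_inputs(n):
    """k = ceil(log2 n), with at least one learning input."""
    return max(1, math.ceil(math.log2(n)))

def learn_slab_boundaries(inputs):
    """inputs: list of k point sets, each a list of n (x, y) pairs.

    Returns the n values x_0, x_k, x_2k, ..., x_{(n-1)k} of the sorted
    pooled x-coordinates; these define the slab boundaries.
    """
    k = len(inputs)
    xs = sorted(x for point_set in inputs for (x, _y) in point_set)
    return xs[::k]                 # every k-th value of the n*k sorted ones
```

Between two consecutive boundaries there are fewer than k of the pooled sample points, which is the property that Lemma 5.2 turns into a bound on the expected number of input points per leaf slab.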

Now let $\lambda$ be a leaf slab in S. Recall that we denote by $X_\lambda$ the number of points of a random input P
that end up in $\lambda$. Using Claim 5.1, we quickly obtain the following lemma.

Lemma 5.2. With probability $1 - n^{-3}$ over the construction of S, we have $E[X_\lambda^2] = O(1)$ for all leaf slabs
$\lambda \in \mathbf{S}$.

Proof. Consider two values $x_i, x_j$ from the original list. Note that all the other $kn - 2$ values are independent
of these two points. For every $r \notin \{i, j\}$, let $Y_t^{(r)}$ be the indicator random variable for $x_r \in t := [x_i, x_j)$. Let
$Y_t = \sum_r Y_t^{(r)}$. Since the $Y_t^{(r)}$'s are independent, by Chernoff's bound [AS00], for any $\beta \in (0, 1]$,

$$\Pr[Y_t \le (1 - \beta)\,E[Y_t]] \le \exp(-\beta^2 E[Y_t]/2).$$

With probability at least $1 - n^{-5}$, if $E[Y_t] > 12\log n$, then $Y_t > \log n$. By applying the same argument for
any pair $x_i, x_j$ and taking a union bound over all pairs, with probability at least $1 - n^{-3}$ the following holds:
for any pair t, if $Y_t \le \log n$, then $E[Y_t] \le 12\log n$.

For any leaf slab $\lambda = [x_{ak}, x_{(a+1)k}]$, we have $Y_\lambda \le \log n$. Let $X_\lambda^{(i)}$ be the indicator random variable for
the event that $p_i \sim \mathcal{D}_i$ lies in $\lambda$, so that $X_\lambda = \sum_i X_\lambda^{(i)}$. Since $E[Y_\lambda] \ge (\log n - 2)\,E[X_\lambda]$, we get $E[X_\lambda] = O(1)$.
By independence of the $\mathcal{D}_i$'s, for all $i \ne j$, $E[X_\lambda^{(i)} X_\lambda^{(j)}] = E[X_\lambda^{(i)}]\,E[X_\lambda^{(j)}]$, so $E[X_\lambda^2] = O(1)$, by Claim 5.1. □

Lemma 1.7 follows immediately from Lemma 5.2 and the fact that sorting the k inputs $P_1, P_2, \ldots, P_k$
takes $O(n \log^2 n)$ time. After the leaf slabs have been determined, the search trees $T_i$ can be found using
essentially the same techniques as before [ACC+11, Section 3.2]. The main idea is to use $n^{\varepsilon} \log n$ rounds to
find the first $\varepsilon \log n$ levels of $T_i$, and to use a balanced search tree for searches that need to proceed to a
deeper level. This only costs a factor of $\varepsilon^{-1}$. We restate Lemma 1.8 for convenience.

Lemma 5.3. Let $\varepsilon > 0$ be a fixed parameter. In $O(n^{\varepsilon})$ rounds and $O(n^{1+\varepsilon})$ time, we can construct search
trees $T_1, T_2, \ldots, T_n$ over S such that the following holds: (i) the trees can be totally represented in $O(n^{1+\varepsilon})$
space; (ii) with probability $1 - n^{-3}$ over the construction of the $T_i$'s, every $T_i$ is $O(1/\varepsilon)$-optimal for restricted
searches over $\mathcal{D}_i$.

Proof. Let $\delta > 0$ be some sufficiently small constant and c be sufficiently large. For $k = c\delta^{-2} n^{\varepsilon} \log n$ rounds
and each $p_i$, we record the leaf slab of S that contains it. We break the proof into smaller claims.

Claim 5.4. Using k inputs, we can compute estimates $\hat{q}(i, S)$ for each index i and slab S. The following
guarantee holds (for all i and S) with probability $> 1 - 1/n$ over the choice of the k inputs. If at least $5\log n$
instances of $p_i$ fell in S, then $\hat{q}(i, S) \in [(1 - \delta)\,q(i, S), (1 + \delta)\,q(i, S)]$.¹

Proof. For a slab S, let N(S) be the number of times $p_i$ was in S, and let $\hat{q}(i, S) = N(S)/k$ be the
empirical probability for this event ($\hat{q}(i, S)$ is an estimate of $q(i, S)$). Fix a slab S. If $q(i, S) \le 1/(2n^{\varepsilon})$,
then by a Chernoff bound we get $\Pr[N(S) \ge 5\log n \ge 10\,k\,q(i, S)] \le 2^{-5\log n} = n^{-5}$. Furthermore, if
$q(i, S) \ge 1/(2n^{\varepsilon})$, then $q(i, S)\,k \ge (c/(2\delta^2))\log n$ and $\Pr[N(S) \le (1 - \delta)\,q(i, S)\,k] \le \exp(-q(i, S)\,\delta^2 k/4) \le n^{-5}$
as well as $\Pr[N(S) \ge (1 + \delta)\,q(i, S)\,k] \le \exp(-\delta^2 q(i, S)\,k/4) \le n^{-5}$. Thus, by taking a union bound, we get
that with probability at least $1 - n^{-3}$, for any slab S, if $N(S) \ge 5\log n$, then $q(i, S) \ge n^{-\varepsilon}/2$ and hence
$\hat{q}(i, S) \in [(1 - \delta)\,q(i, S), (1 + \delta)\,q(i, S)]$. □

We will henceforth assume that this claim holds for all i and S. Based on the values $\hat{q}(i, S)$, we construct
the search trees. The tree $T_i$ is constructed recursively. We will first create a partial search tree, where some
searches may end in non-leaf slabs (or, in other words, leaves of the tree may not be leaf slabs). The root
is just the largest slab. Given a slab S, we describe how to create the subtree of $T_i$ rooted at S. If
$N(S) < 5\log n$, then we make S a leaf. Otherwise, we pick a leaf slab $\lambda$ such that for the slab $S_l$ consisting
of all leaf slabs (strictly) to the left of $\lambda$ and the slab $S_r$ consisting of all leaf slabs (strictly) to the right of $\lambda$
we have $\hat{q}(i, S_l) \le (2/3)\,\hat{q}(i, S)$ and $\hat{q}(i, S_r) \le (2/3)\,\hat{q}(i, S)$. We make $\lambda$ a leaf child of S. Then we recursively
create trees for $S_l$ and $S_r$ and attach them as children to S. For any internal node S of the tree, we have
$q(i, S) \ge n^{-\varepsilon}/2$, and hence the depth is at most $O(\varepsilon \log n)$. Furthermore, this partial tree is $\beta$-reducing (for
some constant $\beta$). The partial tree is extended to a complete tree in a simple way. From each leaf of the partial tree that
is not a leaf slab, we perform a basic binary search for the leaf slab. This yields a tree $T_i$ of depth at most
$(1 + O(\varepsilon))\log n$. Note that we only need to store the partial tree, and hence the total space is $O(n^{1+\varepsilon})$.
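The recursive construction of the partial tree can be sketched as follows. This is one possible instantiation under stated assumptions, not the only valid one: the leaf slab $\lambda$ is chosen as the weighted median of the empirical counts, which leaves at most half of the mass on each side and therefore satisfies the 2/3 condition; slabs are represented as index ranges `(lo, hi)` over the leaf slabs; and the function name `build_partial_tree`, the dictionary layout, and the parameter `threshold` (playing the role of $5\log n$) are hypothetical.

```python
def build_partial_tree(counts, lo, hi, threshold):
    """counts[j] = N(leaf slab j), the number of learning rounds in which
    p_i fell into leaf slab j. Builds the partial tree for leaf slabs lo..hi-1."""
    total = sum(counts[lo:hi])                 # N(S) for the slab S = [lo, hi)
    if hi - lo == 1 or total < threshold:
        return {"slab": (lo, hi)}              # leaf of the partial tree
    acc = 0
    for mid in range(lo, hi):                  # weighted median leaf slab
        acc += counts[mid]
        if 2 * acc >= total:
            break
    # Mass strictly left of mid is < total/2; mass strictly right is <= total/2.
    node = {"slab": (lo, hi), "leaf": mid}
    if mid > lo:
        node["left"] = build_partial_tree(counts, lo, mid, threshold)
    if mid + 1 < hi:
        node["right"] = build_partial_tree(counts, mid + 1, hi, threshold)
    return node
```

Leaves of the returned tree that span more than one leaf slab are exactly the places where, per the text, a plain binary search over the remaining leaf slabs would take over.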

Let us construct, as a thought experiment, a related tree $T'_i$. Start with the partial tree. For every
leaf that is not a leaf slab, extend it downward using the true probabilities $q(i, S)$. In other words, let us
construct the subtree rooted at a new node S in the following manner. We pick a leaf slab $\lambda$ such that
$q(i, S_l) \le (2/3)\,q(i, S)$ and $q(i, S_r) \le (2/3)\,q(i, S)$ (where $S_l$ and $S_r$ are as defined above). This ensures that
$T'_i$ is $\beta$-reducing. By Lemma 1.6, $T'_i$ is O(1)-optimal for restricted searches over $\mathcal{D}_i$ (we absorb the $\beta$ into
O(1) for convenience).

Claim 5.5. The tree $T_i$ is $O(1/\varepsilon)$-optimal for restricted searches.

Proof. Fix a slab S and an S-restricted distribution $\mathcal{D}_S$. Let $q'(i, \lambda)$ (for each leaf slab $\lambda$) be the series of
values defining $\mathcal{D}_S$. Note that $q'(i, S) \le q(i, S)$. Suppose $q'(i, S) \le n^{-\varepsilon/2}$. Then $-\log q'(i, S) \ge \varepsilon(\log n)/2$.
Since any search in $T_i$ takes at most $(1 + O(\varepsilon))\log n$ steps, the search time is at most $O(\varepsilon^{-1}(-\log q'(i, S) + 1))$.

Suppose $q'(i, S) > n^{-\varepsilon/2}$. Consider a single search for some $p_i$. We will classify this search based on the
leaf of the partial tree that is encountered. By the construction of $T_i$, any leaf S' is either a leaf slab or has
the property that $q(i, S') \le n^{-\varepsilon}/2$. The search is of Type 1 if the leaf of the partial tree actually represents
a leaf slab (and hence the search terminates). The search is of Type 2 (resp. Type 3) if the leaf of the partial
tree is an internal node of $T_i$ and its depth is at least (resp. less than) $\varepsilon(\log n)/3$.

¹ We remind the reader that $q(i, S)$ is the probability that $p_i \in S$.

When the search is of Type 1, it is identical in both $T_i$ and $T'_i$. When the search is of Type 2, it takes
at least $\varepsilon(\log n)/3$ steps in $T'_i$ and at most (trivially) $(1 + O(\varepsilon))\log n$ steps in $T_i$. The total number of leaves (that are not
leaf slabs) of the partial tree at depth less than $\varepsilon(\log n)/3$ is at most $n^{\varepsilon/3}$. The total probability mass of $\mathcal{D}_i$
inside such leaves is at most $n^{\varepsilon/3} \times n^{-\varepsilon}/2 < n^{-2\varepsilon/3}$. Since $q'(i, S) > n^{-\varepsilon/2}$, in the restricted distribution
$\mathcal{D}_S$, the probability of a Type 3 search is at most $n^{-\varepsilon/6}$.

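Choose a random $p \sim \mathcal{D}_S$. Let $\mathcal{E}$ denote the event that a Type 3 search occurs. Furthermore, let $X_p$
denote the depth of the search in $T_i$ and $X'_p$ denote the depth in $T'_i$. When $\mathcal{E}$ does not occur, we have argued
that $X_p \le O(X'_p/\varepsilon)$. Also, $\Pr(\mathcal{E}) \le n^{-\varepsilon/6}$. The expected search time is just $E[X_p]$. By Bayes' rule,

$$E[X_p] = \Pr(\bar{\mathcal{E}})\,E_{\bar{\mathcal{E}}}[X_p] + \Pr(\mathcal{E})\,E_{\mathcal{E}}[X_p] \le O(\varepsilon^{-1} E_{\bar{\mathcal{E}}}[X'_p]) + n^{-\varepsilon/6}(1 + O(\varepsilon))\log n,$$
$$E[X'_p] = \Pr(\bar{\mathcal{E}})\,E_{\bar{\mathcal{E}}}[X'_p] + \Pr(\mathcal{E})\,E_{\mathcal{E}}[X'_p] \;\Longrightarrow\; E_{\bar{\mathcal{E}}}[X'_p] \le E[X'_p]/\Pr(\bar{\mathcal{E}}) \le 2\,E[X'_p].$$

Combining, the expected search time is $O(\varepsilon^{-1}(E[X'_p] + 1))$. Since $T'_i$ is O(1)-optimal for restricted searches,
$T_i$ is $O(\varepsilon^{-1})$-optimal. □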

□

A Restricted searches



Lemma A.1. Given an interval S in U, let $\mathcal{F}_S$ be an S-restricted distribution of $\mathcal{F}$. Let T be a $\mu$-reducing
search tree for $\mathcal{F}$. Conditioned on j drawn from $\mathcal{F}_S$, the expected time of an S-restricted search in T for j
is at most $(b/\log(1/\mu))(-\log p'_S + 1)$ (for some absolute constant b).

Now that we may assume that we are comparing against an entropy-sensitive comparison tree, we need to
think about how to make our searches entropy-sensitive. For this we proceed as follows. By Lemma 1.7, we
have a vertical slab structure S such that each leaf slab contains only constantly many points in expectation.
Now, for each distribution $\mathcal{D}_i$, we construct an optimal search tree $T_i$ for the leaf slabs of S. The recursion
continues until $S_l$ or $S_r$ are empty. The search in $T_i$ proceeds in the obvious way. To find the leaf slab
containing $p_i$, we begin at the root and check whether $p_i$ is contained in the corresponding leaf slab. If
yes, the search stops. Otherwise, we branch to the appropriate child and continue.

Each node in $T_i$ corresponds to a slab in S, and it is easily seen that if a node has depth d, then $p_i$
is contained in the corresponding slab with probability at most $2^{-d}$. From this, it quickly follows that $T_i$
is an asymptotically optimal search tree for $\mathcal{D}_i$. However, below we require a stronger result. Namely, we
need a technical lemma showing how an optimal search tree for some distribution $\mathcal{F}$ is also useful for some
conditional distributions.

Fig. 2: (α) The intersections $S \cap V$ in (i)-(iii) are trivial, the intersections in (iii) and (iv) are
anchored; (β) every node of $T_i$ has at most one non-trivial child, except for R.

Let U be an ordered set and $\mathcal{F}$ be a distribution over U. For any element $j \in U$, we let $p_j$ denote the
probability of j in $\mathcal{F}$. For any interval S of U, the total probability of S is $p_S$.

Let T be a search tree over U with the following properties. For any internal node S and a non-leaf
child S', $p_{S'} \le \mu\,p_S$. As a result, if S has depth k, then $p_S \le \mu^k$. Every internal node of T has at most 2
internal children and at most 2 children that are leaves.

Definition A.2. Given a distribution $\mathcal{F}$ and an interval S, an S-restricted distribution $\mathcal{F}_S$ is a conditional
distribution of $\mathcal{F}$ such that j chosen from $\mathcal{F}_S$ always falls in S.

For any S-restriction $\mathcal{F}_S$ of $\mathcal{F}$, there exist values $p'_j$ with the following properties. For each $j \in S$,
$p'_j \le p_j$. For every other j, $p'_j = 0$. The probability of element j in $\mathcal{F}_S$ is $p'_j / \sum_r p'_r$. Henceforth, we will
use the primed values to denote the probabilities in $\mathcal{F}_S$. For an interval R, we set $p'_R = \sum_{r \in R} p'_r$. Suppose we
perform a search for $j \in S$. This search is called S-restricted if it terminates once we locate j in any interval
contained in S.

Lemma A.3. Given an interval S in U, let $\mathcal{F}_S$ be an S-restricted distribution. Conditioned on j drawn
from $\mathcal{F}_S$, the expected time of an S-restricted search in T for j is $O(-\log p'_S + 1)$.

Proof. We bound the number of visited nodes in an S-restricted search. We will prove, by induction on
the distance from a leaf, that for all visited nodes V with $p_V \le 1/2$, the expected number of visited nodes
below V is at most $c_1 + c\log(p_V/p'_V)$, for constants $c, c_1$. This bound clearly holds for leaves. Moreover, since for V
at depth k, $p_V \le \mu^k$, we have $p_V \le 1/2$ for all but the root and at most $1/\log(1/\mu)$ nodes below it on the
search path.

We now examine all possible paths down T that an S-restricted search can lead to. It will be helpful to
consider the possible ways that S can intersect the nodes (intervals) that are visited in a search. Say that
the intersection $S \cap V$ of S with interval V is trivial if it is either empty, S, or V. Say that it is anchored if
it shares at least one boundary line with S. Suppose $S \cap V = V$. Then the search will terminate at V, since
we have certified that $j \in S$. Suppose $S \cap V = S$, so S is contained in V. There can be at most one child
of V that contains S. If such a child exists, then the search will simply continue to this child. If not, then
all possible children (to which the search can proceed) are anchored. The search can possibly continue to
any child, at most two of which are internal nodes. Suppose V is anchored. Then at most one child of V
can be anchored with S. Any other child that intersects S must be contained in it. Refer to Figure 2.

Consider the set of all possible nodes that can be visited by an S-restricted search (remove all nodes
that are terminal, i.e., completely contained in S). These form a set of paths, which form some subtree of T.
In this subtree, there is only one possible node that has two children. This comes from some node R that
contains S and has two anchored (non-leaf) children. Every other node of this subtree has a single child.
Again, refer to Figure 2.

From the above, it follows that for all visited nodes V with $V \ne R$, there is at most one child W whose
intersection with S is neither empty nor W. Let vis(V) be the expected number of nodes visited below V,
conditioned on V being visited. We have $\mathrm{vis}(V) \le 1 + \mathrm{vis}(W)\,p'_W/p'_V$, using the fact that when a search for
j shows that it is contained in a node contained in S, the S-restricted search is complete.

Claim A.4. For V, W as above, with $p_V \le 1/2$, if $\mathrm{vis}(W) \le c_1 + c\log(p_W/p'_W)$, then for $c \ge c_1/\log(1/\mu)$,
with $\mu \in (0, 1)$,

$$\mathrm{vis}(V) \le 1 + c\log(p_V/p'_V). \qquad (1)$$
Proof. By hypothesis, using $p_W \le \mu\,p_V$, and letting $\beta := p'_W/p'_V \le 1$, vis(V) is no more than

$$1 + (c_1 + c\log(p_W/p'_W))\,p'_W/p'_V \le 1 + (c_1 + c\log(p_V/p'_W) + c\log\mu)\,\beta$$
$$= 1 + c_1\beta + c\log(p_V)\,\beta + c\log(1/p'_W)\,\beta + c\log(\mu)\,\beta.$$

The function $x\log(1/x)$ is increasing in the range $x \in (0, 1/2)$. Hence, $p'_W\log(1/p'_W) \le p'_V\log(1/p'_V)$ for
$p'_V \le p_V \le 1/2$. Since $\beta \le 1$, we have

$$\mathrm{vis}(V) \le 1 + c_1\beta + c\log(p_V) + c\log(1/p'_V) + c\log(\mu)\,\beta$$
$$= 1 + c\log(p_V/p'_V) + \beta(c_1 + c\log\mu) \le 1 + c\log(p_V/p'_V),$$

for $c \ge c_1/\log(1/\mu)$. □

Only a slightly weaker statement can be made for the node R having two nontrivial intersections at child
nodes $R_1$ and $R_2$.

Claim A.5. For $R, R_1, R_2$ as above, if $\mathrm{vis}(R_i) \le c_1 + c\log(p_{R_i}/p'_{R_i})$ for $i = 1, 2$, then for $c \ge c_1/\log(1/\mu)$,

$$\mathrm{vis}(R) \le 1 + c\log(p_R/p'_R) + c.$$

Proof. We have

$$\mathrm{vis}(R) \le 1 + \mathrm{vis}(R_1)\,p'_{R_1}/p'_R + \mathrm{vis}(R_2)\,p'_{R_2}/p'_R.$$

Let $\beta := (p'_{R_1} + p'_{R_2})/p'_R$. With the given bounds for $\mathrm{vis}(R_i)$, then using $p_{R_i} \le \mu\,p_R$, vis(R) is bounded by

$$1 + \sum_{i=1,2} (c_1 + c\log(p_{R_i}/p'_{R_i}))\,p'_{R_i}/p'_R \le 1 + c_1\beta + c\beta\log(\mu) + c\beta\log(p_R) + c\sum_{i=1,2} (p'_{R_i}/p'_R)\log(1/p'_{R_i}).$$

The sum takes its maximum value when each $p'_{R_i} = p'_R/2$, yielding

$$\mathrm{vis}(R) \le 1 + c_1\beta + c\beta\log(\mu) + c\beta\log(p_R) + c\beta\log(2/p'_R)$$
$$\le 1 + c\log(p_R/p'_R) + \beta(c_1 + c\log\mu) + c\log 2 \le 1 + c\log(p_R/p'_R) + c\log 2,$$

for $c \ge c_1/\log(1/\mu)$, as in (1), except for the addition of $c\log 2 = c$. □

Now to complete the proof of Lemma 1.6. For the visited nodes below R, we may inductively take
$c_1 = 1$ and $c = 1/\log(1/\mu)$, using Claim A.4. The hypothesis of Claim A.5 then holds for R. For the
visited node just above R, we may apply Claim A.4 with $c_1 = 1 + 1/\log(1/\mu)$ and $c \ge c_1/\log(1/\mu)$. The
result is that for the node V just above R, $\mathrm{vis}(V) \le 1 + c\log(p_V/p'_V)$. This bound holds then inductively
(with the given value of c) for nodes further up the tree, at least up until the $1 + 1/\log(1/\mu)$ top nodes.
For the root Q, note that $p'_Q = p'_S$. Thus the expected number of visited nodes below Q is at most
$1/\log(1/\mu) + 1 + c\log(p_Q/p'_Q) = O(1 - \log(p'_S))$, as desired. □